Recommender systems fairness evaluation via generalized cross entropy
Fairness in recommender systems has been considered with respect
to sensitive attributes of users (e.g., gender, race) or items (e.g., revenue
in a multistakeholder setting). Regardless, the concept has been
commonly interpreted as some form of equality – i.e., the degree to
which the system is meeting the information needs of all its users in
an equal sense. In this paper, we argue that fairness in recommender
systems does not necessarily imply equality, but instead it should
consider a distribution of resources based on merits and needs. We
present a probabilistic framework based on generalized cross entropy
to evaluate fairness of recommender systems under this perspective,
where we show that the proposed framework is flexible and explanatory
by allowing the incorporation of domain knowledge (through an ideal
fair distribution) that can help to understand which item or user aspects
a recommendation algorithm is over- or under-representing.
Results on two real-world datasets show the merits of the proposed
evaluation framework in terms of both user and item fairness.
This work was supported in part by the Center for Intelligent Information
Retrieval and in part by project TIN2016-80630-P (MINECO).
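For intuition, here is a minimal Python sketch of the generalized cross entropy measure described above, assuming the ideal fair distribution and the model's benefit distribution over groups are already given; the function name, the choice of beta, and the example numbers are illustrative, not the authors' implementation.

```python
import numpy as np

def generalized_cross_entropy(p_fair, p_model, beta=2.0):
    """Generalized cross entropy between an ideal fair distribution
    p_fair and the model's benefit distribution p_model over groups.
    It equals 0 iff the two distributions coincide (fair under the
    chosen notion); values farther below 0 indicate more unfairness.
    Sketch of the metric described in the abstract; normalization and
    gain computation details are in the paper."""
    p_fair = np.asarray(p_fair, dtype=float)
    p_model = np.asarray(p_model, dtype=float)
    return (np.sum(p_fair**beta * p_model**(1.0 - beta)) - 1.0) / (beta * (1.0 - beta))

# Example: two user groups. Domain knowledge says an equal share of
# recommendation utility is fair, but the system gives group 0 an 80% share.
p_fair = [0.5, 0.5]
p_model = [0.8, 0.2]
print(generalized_cross_entropy(p_fair, p_model, beta=2.0))  # -0.28125
```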
Counterfactual Reasoning for Bias Evaluation and Detection in a Fairness under Unawareness setting
Current AI regulations require discarding sensitive features (e.g., gender,
race, religion) in the algorithm's decision-making process to prevent unfair
outcomes. However, even without sensitive features in the training set,
algorithms can still discriminate. Indeed, when sensitive features are
omitted (fairness under unawareness), they can be inferred through non-linear
relations with so-called proxy features. In this work, we propose a way to
reveal the potential hidden bias of a machine learning model that can persist
even when sensitive features are discarded. This study shows that it is
possible to unveil whether the black-box predictor is still biased by
exploiting counterfactual reasoning. In detail, when the predictor provides a
negative classification outcome, our approach first builds counterfactual
examples for a discriminated user category to obtain a positive outcome. Then,
the same counterfactual samples feed an external classifier (that targets a
sensitive feature) that reveals whether the modifications to the user
characteristics needed for a positive outcome moved the individual to the
non-discriminated group. When this occurs, it could be a warning sign for
discriminatory behavior in the decision process. Furthermore, we leverage the
deviation of counterfactuals from the original sample to determine which
features are proxies of specific sensitive information. Our experiments show
that, even if the model is trained without sensitive features, it often
suffers from discriminatory biases.
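The probing loop sketched below follows the strategy this abstract describes; the counterfactual generator and the sensitive-attribute classifier are left as assumed interfaces (any counterfactual-explanation library and any fitted classifier could fill those roles), so this is a sketch rather than the authors' implementation.

```python
import numpy as np

def counterfactual_bias_probe(X_negative, generate_counterfactual,
                              sensitive_clf, protected_label):
    """Probe a black-box model for hidden bias under fairness under
    unawareness:
      1. for each sample with a negative outcome, build a counterfactual
         that flips the decision to positive;
      2. let an external classifier (trained to predict the sensitive
         feature from proxy features) label the counterfactual;
      3. count how often the counterfactual "migrated" out of the
         protected group -- a warning sign of discrimination."""
    flips, counterfactuals = 0, []
    for x in X_negative:
        x_cf = generate_counterfactual(x)  # assumed interface
        counterfactuals.append(x_cf)
        if sensitive_clf.predict(x_cf.reshape(1, -1))[0] != protected_label:
            flips += 1
    flip_rate = flips / max(len(X_negative), 1)
    # Features that had to change the most to reach a positive outcome
    # are candidate proxies of the sensitive information.
    deviation = np.abs(np.asarray(counterfactuals) - np.asarray(X_negative)).mean(axis=0)
    return flip_rate, deviation
```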
Counterfactual Fair Opportunity: Measuring Decision Model Fairness with Counterfactual Reasoning
The increasing application of Artificial Intelligence and Machine Learning
models poses potential risks of unfair behavior and, in light of recent
regulations, has attracted the attention of the research community. Several
researchers focused on seeking new fairness definitions or developing
approaches to identify biased predictions. However, none exploits the
counterfactual space to this end. In that direction, the methodology proposed
in this work aims to unveil unfair model behaviors using counterfactual
reasoning in the fairness under unawareness setting. A counterfactual
version of equal opportunity named counterfactual fair opportunity is defined
and two novel metrics that analyze the sensitive information of counterfactual
samples are introduced. Experimental results on three different datasets show
the efficacy of our methodology and metrics, disclosing the unfair
behavior of classic machine learning and debiasing models.
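For context, the sketch below shows only the classic equal opportunity gap that counterfactual fair opportunity builds on; the counterfactual version and the two novel metrics are defined in the paper, and all names and numbers here are illustrative.

```python
import numpy as np

def equal_opportunity_gap(y_true, y_pred, group):
    """Classic equal opportunity: absolute difference in true positive
    rates between the protected (0) and non-protected (1) groups.
    Counterfactual fair opportunity extends this idea to the fairness
    under unawareness setting via counterfactual samples."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tpr = []
    for g in (0, 1):
        mask = (group == g) & (y_true == 1)
        tpr.append(y_pred[mask].mean())
    return abs(tpr[1] - tpr[0])

# Qualified individuals (y_true == 1) in group 0 are accepted half as
# often as those in group 1: gap = |1.0 - 0.5| = 0.5.
y_true = np.array([1, 1, 1, 1, 0, 0])
y_pred = np.array([0, 1, 1, 1, 0, 1])
group = np.array([0, 0, 1, 1, 0, 1])
print(equal_opportunity_gap(y_true, y_pred, group))  # 0.5
```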
A flexible framework for evaluating user and item fairness in recommender systems
This is the accepted manuscript; the Version of Record is available online at: https://doi.org/10.1007/s11257-020-09285-1
One common characteristic of research works focused on fairness evaluation (in machine learning) is that they call for some form of parity (equality), either in treatment (ignoring the information about users' memberships in protected classes during training) or in impact (enforcing proportional beneficial outcomes to users in different protected classes). In the recommender systems community, fairness has been studied with respect to both users' and items' memberships in protected classes defined by some sensitive attributes (e.g., gender or race for users, revenue in a multi-stakeholder setting for items). Here too, the concept has been commonly interpreted as some form of equality, i.e., the degree to which the system is meeting the information needs of all its users in an equal sense. In this work, we propose a probabilistic framework based on generalized cross entropy (GCE) to measure the fairness of a given recommendation model. The framework comes with a suite of advantages: first, it allows the system designer to define and measure fairness for both users and items and can be applied to any classification task; second, it can incorporate various notions of fairness, as it does not rely on specific, predefined probability distributions, which can instead be defined at design time; finally, its design uses a gain factor, which can be flexibly defined to contemplate different accuracy-related metrics, so that fairness can be measured upon decision-support metrics (e.g., precision, recall) or rank-based measures (e.g., NDCG, MAP). An experimental evaluation on four real-world datasets shows the nuances captured by our proposed metric regarding fairness on different user and item attributes, where nearest-neighbor recommenders tend to obtain good results under equality constraints. We observed that when the users are clustered based on both their interaction with the system and other sensitive attributes, such as age or gender, algorithms with similar performance values behave differently with respect to user fairness due to the different ways they process data for each user cluster.
The authors thank the reviewers for their thoughtful comments and suggestions. This work was supported in part by the Ministerio de Ciencia, Innovación y Universidades (Reference: PID2019-108965GB-I00) and in part by the Center for Intelligent Information Retrieval. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsor.
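To make the gain factor concrete, here is a small sketch, under the assumption that per-user gains come from any accuracy metric (precision, recall, NDCG, ...): it aggregates them into the group benefit distribution that a GCE-style measure (as sketched earlier in this list) would compare against an ideal fair distribution.

```python
import numpy as np

def group_benefit_distribution(user_gain, user_group, n_groups):
    """Turn per-user gains (precision, recall, NDCG, MAP, ...) into a
    normalized benefit distribution over groups, i.e., the model-side
    distribution a GCE-style fairness measure consumes."""
    totals = np.zeros(n_groups)
    for gain, g in zip(user_gain, user_group):
        totals[g] += gain
    return totals / totals.sum()

# Example: NDCG per user, two user groups (e.g., split by age or gender).
user_gain = np.array([0.9, 0.7, 0.2, 0.1])
user_group = np.array([0, 0, 1, 1])
print(group_benefit_distribution(user_gain, user_group, 2))  # ~[0.84, 0.16]
```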
A Topology-aware Analysis of Graph Collaborative Filtering
The successful integration of graph neural networks into recommender systems
(RSs) has led to a novel paradigm in collaborative filtering (CF), graph
collaborative filtering (graph CF). By representing user-item data as an
undirected, bipartite graph, graph CF utilizes short- and long-range
connections to extract collaborative signals that yield more accurate user
preferences than traditional CF methods. Although the recent literature
highlights the efficacy of various algorithmic strategies in graph CF, the
impact of datasets and their topological features on recommendation performance
is yet to be studied. To fill this gap, we propose a topology-aware analysis of
graph CF. In this study, we (i) take some widely-adopted recommendation
datasets and use them to generate a large set of synthetic sub-datasets through
two state-of-the-art graph sampling methods, (ii) measure eleven of their
classical and topological characteristics, and (iii) estimate the accuracy
of four popular and recent graph-based RSs (i.e., LightGCN, DGCF, UltraGCN,
and SVD-GCN) on the generated sub-datasets. Finally, the
investigation presents an explanatory framework that reveals the linear
relationships between characteristics and accuracy measures. The results,
statistically validated under different graph sampling settings, confirm the
existence of solid dependencies between topological characteristics and
accuracy in graph-based recommendation, offering a new perspective on how
to interpret graph CF.
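As a shape-of-the-pipeline illustration (not the study's code), the sketch below computes a small subset of such characteristics for user-item bipartite graphs with networkx and fits a linear explanatory model with scikit-learn; the sub-datasets and accuracy values are toy placeholders.

```python
import numpy as np
import networkx as nx
from sklearn.linear_model import LinearRegression

def dataset_characteristics(interactions):
    """A few illustrative classical/topological features of a user-item
    bipartite graph (the study measures eleven): space size, density,
    and average user/item degree."""
    users = {u for u, _ in interactions}
    items = {i for _, i in interactions}
    G = nx.Graph(interactions)  # undirected bipartite user-item graph
    n_u, n_i, n_e = len(users), len(items), G.number_of_edges()
    return [n_u * n_i, n_e / (n_u * n_i), n_e / n_u, n_e / n_i]

# Toy sub-datasets (edge lists) with made-up accuracy scores standing in
# for a graph CF model's NDCG on each graph sample.
subdatasets = [
    [("u1", "i1"), ("u1", "i2"), ("u2", "i2")],
    [("u1", "i1"), ("u2", "i1"), ("u2", "i2"), ("u3", "i3")],
    [("u1", "i1"), ("u2", "i2"), ("u3", "i3")],
]
ndcg = np.array([0.31, 0.42, 0.25])  # placeholders, not real results
X = np.array([dataset_characteristics(s) for s in subdatasets])
explainer = LinearRegression().fit(X, ndcg)
print(dict(zip(["space", "density", "user_deg", "item_deg"], explainer.coef_)))
```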